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A HIGHLY OPTIMIZED NONLINEAR LEAST SQUARES METHOD 
FOR SINUSOIDAL SOUND MODELLING 



FIELD OF THE INVENTION 

The present invention relates to the sinusoidal modelling (analysis and synthesis) of mu- 
sical signals and speech. The analysis computes for a windowed signal of length JV, a set of 

io K amplitudes, phases and frequencies using nonlinear least squares estimation techniques. 
The synthesis comprises the reconstruction of the signal from these parameters. Methods 
are disclosed for three different models being- 1) a stationary sinusoidal model with arbitrary 
frequencies, 2) a stationary sinusoidal model with several series of harmonic frequencies and 
3) a nonstationary model with complex polynomial amplitudes of order P. It is disclosed 

is how the computational complexity can be reduced significantly by using any window with a 
bandlimited frequency response. For instance, the complex amplitude computation for the 
first model is reduced from 0(K 2 N) to O(NlogN). In addition, a scaled table look-up 
method is disclosed which allows to use window lengths which are not necessarily a power of 
two. 



WO 2005/055202 



2 



PCT/EP2004/013630 



BACKGROUND OP THE INVENTION 

The sinusoidal modelling of sound signals sueh as music and speech is a powerful tool for 
parameterizing sound sources. Once a sound has been parameterized, it can be synthesized 
for example, with a different pitch and duration. 

A sampled short time signal x n on which a window w n is applied may be represented by 
a model x n , consisting of a sum of K sinusoids which are characterized by their frequency 
cjfc, phase <{>k and amplitude a k) . 

K-l 

in = w n ^a k cos(2nu k ^p. + 4> k ) (l) 
. *=°. 

The offset value no allows the origin of the timescale to be placed exactly in the middle of 
the window. For a signal with length N } n 0 equals *yl. 

If the signal would be synthesized by a bank of oscillators, the complexity would be 
O(NK) with TV being the number of samples and K the number of sinusoidal components. 
As described in patent WO 93/03478, the computational efficiency of the synthesis can be 
improved by using an inverse fourier transform. However, the method requires the use of 
a window length which is a power of two and does not allow nonstationary behavior of the 
sinusoids within the window. 

In "Refining the digital spectrum", Circuits and Systems, 1996, by P. David and J. 
Szczupak, a method is described which allows to estimate the amplitudes and frequencies. 
This method relies on two spectra of which the second one is delayed in time. In addition 
the effect of the window is reduced by a matrix inversion which requires a complexity 0{K 3 ) 
for a K x K matrix. 

The amplitude estimation methods of the prior art can be categorized in two classes: 

• Sequential methods compute the parameters for each sinusoid in a sequential man- 
ner, i.e. sinusoid by sinusoid. Several methods have been claimed previously: 

1. WO 90/13887 discloses the estimation of the amplitudes by detecting individual 
peaks in the magnitude spectrum, and performing a parabolic interpolation to 
refine the frequency and amplitude values. 

2. In WO 93/04467 and WO 95/30983 a least mean squares method called analysis- 
by-synthesis/overlap-add (ABS/OLA) is disclosed for individual sinusoidal com- 
ponents. 
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The sequential methods have the advantage that they can be computed very efficiently. 
However, in case of overlapping frequency responses their result is suboptimal which 
makes that they cannot be applied when small analysis windows are used. Therefore, 
the use of large analysis windows is required. However, the definition of the model 
relies implicitly on the assumption that the amplitudes and frequencies are constant 
over the analysis window. This assumption is not valid in the case of large analysis 
windows and results in a poor quality. 

• Simultaneous methods allow to take into account the overlap between the frequency 
responses of different sinusoidal components. A method which takes into account the 
overlap allows to use smaller analysis windows and results in a better quality since the 
assumption of constant amplitude and frequency is more likely to hold. However, the 
methods of the prior art known from the literature have a high computational com- 
plexity. For instance, the time complexity for the amplitude computation of stationary 
sinusoids is 0(K 2 N) . 

There is a need for a simultaneous method for analyzing sound signals with a lower compu- 
tational complexity. 
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SUMMARY OF THE INVENTION 

The present invention relates to the modelling (analysis and synthesis) of musical signals 
and speech and provides therefore highly optimized nonlinear least squares methods. 

In section 1 an introduction to the invention is given. Three different sinusoidal models 
are presented in subsection 1.1. An overview of the nonlinear least squares methodology is 
described in section 1.2 and illustrated by Figure 1. The computational complexity can be 
reduced significantly by using a window with a bandlimited frequency response. Subsection 
1.3 describes such a window and its frequency response is illustrated by Figures 2 and 3. 

Section 2 discusses efficient spectrum computation methods for the different models and 
is Illustrated by Figure 4. 

Section 3 discloses a highly optimized least squares method for the computation of the 
complex amplitudes. First, the time domain derivation is described in subsection 3.2, which 
is transformed to the frequency domain in section 3.3. It is shown that the bandlimited 
property of the frequency response of the square window results in a band diagonal system 
matrix as depicted in Figure 5. This makes that the system can be solved in linear time 
instead of a power three complexity. The amplitude estimation algorithm is illustrated by 
Figure 6. 

Section 4 describes frequency optimization methods for the stationary nonharmonic signa, 
as there are 

1. Gradient based methods (section 4.1) 

2. Gauss-Newton optimization (section 4.2) 

3. Levenberg-Marquardt optimization (section 4.3) 

4. Newton optimization (section 4.4) 

These methods are unified in section 4.5 where two parameters Ai and A2 allow to switch 
between different optimization methods. The frequency optimization algorithm is depicted 
in Figure 7. 

Section 5 discloses the frequency optimization for the harmonic model. Efficient al- 
gorithms for gradient-based (subsection 5.1), Gauss-Newton (subsection 5.2), Levenberg- 
Marquardt (subsection 5.3) and Newton (subsection 5.4) optimization are disclosed and uni- 
fied in (subsection 5.5). The frequency optimization algorithms for the harmonic model are 
depicted in Figure 3 and Figure 9. 
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Section 6 shows that the amplitude estimation method can be expended to the complex 
polynomial amplitude model described in subsection 6.1. Subsection 6.2 discloses how the 
system matrix can be made band diagonal as is illustrated by figure 10. The complete 
algorithm is depicted by Figure 11. In subsection 6.3 it is derived how the instantaneous 
phases and amplitudes can be computed from the complex polynomial amplitudes. It is 
shown that the instantaneous frequency can be used as a new estimate of the frequency. The 
instantaneous amplitude can also be interpreted as a damped function. It is shown how the 
damping factor can be computed. 

All previous methods are based on the computation of the frequency responses by using 
look-up tables. Normally, it is desired that the window length is a power of two so that an 
FFT can be used. In section 7 it is disclosed that it is possible to use a shorter window and 
to zero-pad the signal up to a power of two length. This results in a scaling of the frequency 
responses. An illustration is provided by Figure 12. 

Section 8 describes a preprocessing routine which determines tfcie number of diagonal 
bands D that are relevant. 

Section 9 describes several applications which are facilitated by the invention, as there 

are 

1. arbitrary sample rate conversion (subsection 9.1) 

2. high resolution (multi-)pitch etimation (subsection 9.2) 

3. parametric audio coding (subsection 9.3) 

4. source separation (subsection 9.4) 

5. automated annotation and transcription (subsection 9.5) 

6. audio effects (subsection 9.6) 

Several applications are depicted in Figure 13. 
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BRIEF SUMMARY OF THE FIGURES 

Figure 1 depicts an overview of the complete nonlinear least square method for sinusoidal 
modelling. 

Figure 2 depicts the frequency responses of the Blackmann- Harris window and the first 
and second derivative of frequency response. 

Figure 3 depicts the frequency responses of the zero padded Blackmarm- Harris window, 
the frequency response of the squared window and its second derivative. 

Figure 4 depicts the optimized spectrum computation method for the harmonic and the 
nonstationary model. 

Figure 5 illustrates the band diagonal property of the system matrix B. 

Figure 6 depicts the optimized amplitude computation. 

Figure 7 depicts the frequency optimization for the stationary nonharmonic model. 
Figure 8 depicts the frequency optimization for the stationary harmonic model. 
Figure 9 depicts a subroutine of the frequency optimization for the stationary harmonic 
model. 

Figure 10 illustrates the band diagonal property of the system matrix B for the compu- 
tation of the complex polynomial amplitudes. 

Figure 11 depicts the optimized amplitude computation for the complex polynomial am- 
plitudes. 

Figure 12 depicts the theoretic motivation for the scaled look-up table. 
Figure 13 depicts the applications that axe facilitated by the invention. The applications 
that are illustrated are: 1) audio coding, 2) audio effects, 3) source separation. 
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DETAILED DESCRIPTION OF THE INVENTION 



1 Introduction 

1.1 The Signal Models 

The present invention discloses highly optimized non linear least squares methods for 
sinusoidal modelling of audio and speech. Depending on the assumptions that can be made 
about the signal three types of models are considered 

1. A model with K stationary components where each component is characterized by its 
complex amplitude A k and frequency u> k . This model is called stationary since the 
amplitudes and frequencies are constant over time. In addition, the model includes the 
analyses window w n . 

= 5ft w n > " A k exp(-2niLJk2=£&) (2) 



Wn ^2 A * exp(-27r^^) 



2. A model with S quasi-periodic stationary sound sources with a fundamental frequency 
cj A , each consisting of Sk sinusoidal components with frequencies that are integer mul- 
tiples of id*. The complex amplitude of the pth component of the kth source is denoted 
Afc tP . The window w n is taken in account. 

5-1 s k ~\ 

w » X) £ Ak * exp(-27ripw*^) (3) 

k—Q p=0 



X n = 5ft 



3. A model with K nonstationary sinusoidal components which have independent frequen- 
cies u k . The amplitudes A k ^ p denote the p-th order of the fc-th sinusoid. The window 
w n is taken into account. 

ir-i p-i 

Wn D Ap(-27rz^) J, exp(-27ricj Jt ^i) 

k=Q p=0 



= 5ft 



(4) 



1.2 A Highly Optimized Non Linear Least Squares Method 



The goal of the nonlinear least squares method consists of determining the frequencies and 
complex amplitudes for these different models by minimizing the square difference between 
the model x n and a recorded signal x n . 

AT-l 

* X>«-^) 2 (5) 
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This difference r n denned as 

r n = x n - x n (6) 

is called the residual. For a given set of frequencies, the amplitudes can be computed ana- 
lytically by a standard least squares procedure. The frequencies on the other hand cannot 
be computed analytically and are optimized iteratively. Applying the frequency optimiza- 
tion and amplitude computation in an alternating manner is called a nonlinear least squares 
method. 

Figure 1, depicts the complete analysis/synthesis method according to the embodiment 
of the invention. First, the initial values for the frequencies Uk are determined. For the 
stationary model with independent frequencies and the non stationary model, this consists 
of a simple peak picking. For the harmonic stationary sources a (multi-)pitch estimator can 
be used. 

The frequencies at iteration r are denoted u> w yielding for the initial frequencies u) (0) . 
With these initial frequencies the amplitudes A are computed. The amplitudes A and fre- 
quencies a) allow to compute the spectrum X m . When the model spectrum X m is subtracted 
from the signal spectrum X m the residual spectrum Rm is obtained. Using the residual 
spectrum Rm, the amplitudes A and frequencies Q^ r \ the frequency optimization step Aa> is 
computed which allows to compute the frequency value for the next iteration 

This iterative loop is continued until a stopping criterium is met such as 

• stop after a fixed number of iterations 

• stop after a fixed computation time 

• stop when the error function drops below a specified value 

• stop when the error change drops below a specified value 

• stop when the error function starts to increase. 

Using prior art methods, the practical applications the nonlinear least squares methods axe 
prohibited by their computational demands. The contributions which are disclosed in this 
invention are algorithms which realize significant computational gains for 

1. the spectrum computation 
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2, the amplitude computation 



3. the frequency optimization 



1.3 Window Choice 



A crucial element in order to obtain this computational gain is to choose a window with 
a bandlimited frequency response. This means that the frequency response of the window 
W(m) is assumed to be zero outside the interval —j3<m<f3. In particularly, but not 
exclusively, we consider the Blackmann-Harris window 



w 



' n = a + 6cos(2?r^) -|-ccos(47r^) + dcos(67r=^p) 



(8) 



with a = 0.35875, b = 0.48829, c = 0.14128 and d = 0.01168. The frequency response of 
the Blackmann-Harris window is shown in Figure 2. Any other window with a bandlimited 
frequency response can be applied. Throughout the description of the invention, the band- 
limited property of the frequency response of the window will play a crucial role. In addition, 
the derivatives of the frequency response are also bandlimited. Taking the derivative of the 
frequency responses is equivalent with multiplying the window with a straight line as shown 
by Eq. (9). Also the frequency response of the square window is bandlimited which can be 
understood easily taking into account that taking the square in the time domain is equivalent 
with a convolution in the frequency domain. This however, doubles, the size of the main lobe. 
These frequency responses are illustrated in Fig. 3. 

N-l 



W(m 
W\m 
W"(m 

Y(m 
Y'(m 
Y"{m 



n=0 
N-l 



n=0 
N-l 

(-2ttz^) 2 w n exp(-27rim^) 

71=0 

w 2 n exp(-27rim^p) 

n=0 
N-l 

^ (-27ri^)ii;;exp(-27rzm^p) 

n=0 
N-l 

^2 (-2ni^p-) 2 wl exp(-2mm^) 



(9) 



n=0 
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2 Spectrum Computation 



The model defined in Eq. 2 is the real part of the complex signal 

K-l 



£n = J] A k exp(-27ricj fc ^) 



(10) 



Taking the fourier transform of this complex signal results in a spectrum. X m defined as 



K-l 



(11) 



Jb=0 



where W(m) denotes the discrete time fourier transform of w n . The spectrum model X m is 
a linear combination of frequency responses of the window, which are shifted over w k and 
weighted with a complex factor A k . 

In an analogue manner one obtains for the harmonic model 



5-1 s k -i 

fc=0 p=0 



(12) 



and for the non stationary model 

N-l 

X m = 



K-lP-1 

J2 A klP (-27ri^yexp(-2muj k ^) 

w n (-27ri^yexp(-27ri(tj k + m)*=ff*) 



n=0 

K-lP-l 

fc=0 P=0 
K-l P-l 

A:=0 p=0 



exp(-27rzm^pi) 



_n=0 

'dm?' 



(13) 



The spectrum computation is illustrated in Figure 4. 
Conclusion 

When x n would be computed in the time domain this would result in a complexity 0(KN). 
However because of the bandlimited property of W(m) only ro-values must be considered 
for which — (3 < m + co k < p. As a result, the frequency response of each component can 
be computed in constant time yielding 0{K) for all components and 0(N log N) for the 
inverse fourier transforms. The reduction from O(KN) to 0(N log N) is interesting if K is 
sufficiently large. 

Also the derivatives of the frequency response are bandlimited and can be computed by 
look-up tables. This reduces the complexity from O(KPN) for the time domain computa- 
tion of the nonstationary model to 0(KP + NlogN) where the first term^ comes from the 
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spectrum computation second term from the inverse fourier transform. Since the order of 
the polynomial P is rather small, the second term predominates the complexity. 

A preferred- embodiment of the method according to the invention, comprises the com- 
putation of the spectrum as a linear combination of the frequency responses of the window 
according to Eq. (11) for the stationary nonharmonic model, Eq. (12) of the harmonic model 
and Eq. (13) for the nonstationary model, whereby only the main lobes of the responses are 
computed by using look-up tables. This method reduced the time complexity from 0(KPN) 
toO{N\ogN). 

3 Complex Amplitude Computation 

3.1 Introduction 

In this section, an efficient least mean squares technique is described for the computation 
of the complex amplitudes. In WO 90/13887, the estimation of the amplitudes is claimed 
by detecting individual peaks in the magnitude spectrum, and performing a parabolic inter- 
polation to refine the frequency and amplitude values. In WO 93/04467 and WO 95/30983 
a least means squares is presented which is applied iteratively on the signal, subtracting a 
single sinusoidal component each time. 

The major difference with the present invention is that all amplitudes are computed simul- 
taneously for a given set of frequencies. This allows to resolve strongly overlapping frequency 
responses of sinusoidal components. As will be shown later, the original computational com- 
plexity of this method is 0(K 2 N) where the K denotes the number of partials and TV the 
signal length. The invention however, solves this problem in 0(N log N) and reduces the 
space complexity, which is originally 0(K 2 ) y to O(K). 

3.2 Complex Amplitude Computation in the Time Domain 

The complex amplitude computation is derived in the time domain. Eq. (2) is reformu-r 
lated as a sum of cosines and sines where the real part of the complex amplitude is denoted 
A T k = ak cos <f>k and the imaginary part as A l k = a* sin <£ fc . The signal model for the short time 
signal x n can now be written as 




/<r-i 




(14) 
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The error function x{A\G)) expresses the square difference between the samples in the win- 
dowed signal x n and the signal model x n . 

x (i;a))=:Xi(a: n -i n ) 2 (15) 

n=0 

This notation indicates that the error is minimized with respect to a vector of variables A 
s for a given set of frequencies w that are assumed to be known. The minimization is realized 
by putting the derivatives with respect to the unknowns to zero 



d X (A;u>) d X (A;u,) _ 
' dA\ 



dA\ 



(16) 



resulting respectively in 



K-l /n-i y 

Y^ A l\Yl W n C0S(27TCU^) COS(2mJ^) 
k=0 \n=0 > 
K-l /N-l 

k r 



+ £ 4 J]) v>l sin(27ria; fc ^) cos(2™,^) J 
^ \„=o / 

N-l 

= X n W n COs(27YU)l^) 



fc=0 



(17) 



n=0 



and 



K-l /N-l \ 



*=0 



N-l 

= Y2 XnWn sin(27ro;/ z=j?&) 



K-l /N-l 

E 

k=Q Vn=0 



(18) 



n=0 



These two sets of K equations have 2K unknown variables what can be written in the 
following matrix form 

t^i i x-»i 1 at nl 

(19) 



" B l,l 


B 1,2 








' C 1 " 


B 2 ' 1 


B 2 - 2 

_ 




A' 




C 2 



WO 2005/055202 



13 



PCT/EP2004/013630 



with 

N-l 

B lk = ^^cos(27r^^)cos(27rw^) 

n=0 

n=0 
N-l 

B?£ = ^tu^cos(27r^^)sm(27ro;z^) 

n=0 
N-l 

B lk = ^^sin(27rwjbS^)sin(27ixj|2^) 

n=0 

C? = J^2: n ^ n cos(27r^^pa) 

n=0 

C? = Xn^n sin(27rw^) 

Under the condition that every sinusoid has a different frequency, the matrix B cannot have 
two linear dependent rows. Therefore, it is well conditioned which implies a unique and 
accurate solution for A. 



The computational complexity of this method is very high, for instance, 

• the computation of the matrix B has a complexity 0(K 2 N) 

• the computation of the matrix C has a complexity O(KN) 

• the solution of the linear set of equations is Q(K Z ) 

Note that the order of magnitude of K and TV is not significantly different. In the next 
sections, the complexity is reduced to 0(N log TV). 



3.3 Efficient Complex Amplitude Computation 



Several optimizations for the time-domain computation are disclosed. The main compu- 
tational burden is the construction of the matrices B and C and solving the system of linear 
equations which have complexity 0(K 2 N) and 0(K 3 ) respectively. The matrices B and C 
are expressed in terms of the frequency responses of the window W{m) and square window 
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Y(m) resulting in 

*iS = -\%{yi»k+*i)) + \*{Y{uk-u>i)) 

C? = -zffii^XmWim + unU (20) 

\ 771=0 / 

Since the window is real and symmetric, its frequency response is also real and symmetric. 

Since B 1,2 and B 2,1 are expressed in terms of the imaginary part of the frequency response, 
io they only contain zeros. By using the look-up tables for Y(m) in the computation of B the 

summation over N is eliminated resulting in a complexity 0(K 2 ) instead of 0(K 2 N). When 

C is computed, only the m- values need to be considered which fall in the main lobe of W(m) 

around cj/ reducing O(KN) to 0(K). However, solving the equations still requires 0(K 3 ). 
This can again be optimized by taking into account that B 1,1 and B 2,2 contain, only 
is significant values around the main diagonal. This property is illustrated in figure 5 for a 

single harmonic sound source but is also valid for arbitrary frequencies sorted in ascending 

order. 

When defining a matrix Y~ lik = $t(Y(co k - uji)) and a matrix Y + i >Jfc = dt(Y(u) k one 
obtains 

B 1 - 1 = |(Y + + Y-) (21) 
B 2 ' 2 = -J(Y + -Y-) (22) 

In the case of a harmonic sound source, all frequencies are a multiples of the fundamental 
frequency u/, from which follows that 

Y~ lik = R(Y((A - Qw)) 

Y + w = R(y((fc + Qw)) (23) 

Since both kuj and lu lie between zero and y, fcne * r difference lies between — y and y. By 
denoting the bandwidth of the main lobe as 2/?, and taking into account that only values 
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must be considered that lie within the bandwidth of the frequency response, it follows that 

-P<{k- l)co < (3 (24) 

As a result, only the values k-l are considered between f-fjl ^ LzjJ- Since k and 1 denote 
the row and column index of Y~, k - I denotes the diagonal. This implies that only 2D + 1 
diagonal bands must be considered with 

D = A (25) 

CO 

The number of diagonal bands is dependent on the bandwidth /? of the frequency response 
and the fundamental frequency to. For instance, when the window length is chosen to be 
three periods, to = 3, and knowing that p = 8 for the square Blackmann-Harris window, a 
value of 2 is obtained for D. This means that only the main diagonal and the first two upper 
and lower diagonals are relevant. 

On the other hand, when considering the matrix Y 4 ", the values for (k -f l)io lie between 
zero and N. The frequency response of the window is in this case divided over the left and 
right hand side of the interval When considering the left half of the response, only significant 
values are obtained when (k + l)to < /?, which yields for co = 3 that k + 1 < 2. As a result, 
only significant values are obtained in the upper left corner. For the right hand side of the 
interval, the main lobe ranges from N — (3 to N yielding, 

k + i> ^z£ (26) 
to 

Note that ^ corresponds with the maximal possible value of k + / which corresponds with 
the lower right corner of the matrix. This is illustrated in Figure 5. 

A typical method to solve a linear set of equations is Gaussian elimination with back- 
substitution. This method has a time complexity 0(K 3 ). However, since the system matrix 
is band diagonal, this method requires a time complexity 0(D 2 K). Since D is significantly 
smaller than K this results finally in O(K). ■ ■ 

In addition, the space complexity can be reduced from 0(K 2 ) to O(K) by storing only 
the diagonal bands. Therefore, shifted matrices are defined 

B 2,2 i,k = B?f+k-D ( 27 ) 

where D denotes the number of diagonals that are stored around the main diagonal. Note 
that I = 0, . . . , L— 1 and k = 0, . . . , 2D. For combinations (/c, I) resulting in an index outside 
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B, a zero value is returned. The amplitudes are computed directly from the shifted versions 
of B 1,1 , B 2 ' 2 . By denoting this routine as SOLVE this is written as 

A^SOLVE^.C 1 ) 

A} = SOLVE$&*,C 2 ) (28) 

Conclusions: 

• The space complexity of B is reduced from 0(K 2 ) to 0(K ) by storing it as B . Since 
each element is computed by a look-up table, the time complexity is also Q(K). 

• The bandlimited property of W(m), makes that the summation over m each element of 
C 1 and C 2 according to Eq. (20) can be limited to samples for which — ft < m+o; < ft. 
This implies that the computation of each element can be computed in constant time, 
yielding in 0(K) for the whole vector. 

• A second result of the band diagonal form of B is that the system can now be solved 
in O(K) instead of 0{K Z ). 

• The main computational bottleneck is the FFT for the computation of X m which 
requires a complexity 0(N log N). 

The amplitude computation is illustrated in Figure 6. 

A preferred embodiment of the method according to the invention, comprises the step of 
computing the stationary complex amplitudes, by solving the equations given in Eq. (19), 
using Eq. (20) such that only the elements around the diagonal of B are taken into account, 
whereby a shifted form B is computed containing only D diagonal bands of B according to 
Eq. (27) and Eq. (20), whereby the computation of the Eq. (20) requires the computation of 
the frequency response of the window and the square window denoted by W(m) and Y(m) 
respectively, and solving equation given by Eq. (19) directly from B and C (Eq. (28)) by 
an adapted gaussian elimination procedure. 

4. Frequency Optimization for the Stationary Model 

In this section, methods are disclosed which allow to optimize the frequency values for the 
stationary model with independent components. The signal model given in Eq. (2) is written 

as 

*» = ^n^X^( Ajtex P(" 27r ^ z ^r fl ) + A;exp(2iriw fc ==ip)) (29) 
1 *=o 
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A variety of iterative methods are known which allow to improve the frequency values w. By 
denoting the iteration index as ( r ) one obtains 

©(h-D = d) (r) + Au> > (30) 

The invention comprises methods to calculate the optimization step Aw in an efficient man- 
ner. In the following subsections it is disclosed how the computational complexity of some 
well-known optimization techniques can be reduced to 0(N log N) while their time-domain 
equivalent has a complexity 0(K 2 N). 
We consider 

1. gradient based methods 

2. Gauss-Newton optimization 

3. Levenberg-Marquardt optimization 

4. Newton optimization 

4.1 Gradient Based Methods 

A first class of optimization algorithms are based on the gradient of the error function 
defined by 

dx(P;A) 

One simple method for the optimization consists of computing the optimization step as 

Aa> = -77b (31) 

where \x is called the learning rate. When the gradient is computed for the model given in 
Eq. (29) and expressed in the frequency domain one obtains 



fH = ~R^EflmW(wi-m)^ (32) 



where Rm = X m - X m denotes the spectrum of the residual r n and W'(m) the derivative of 
the frequency response W(m). 

Conclusion ^ 
Analogue to the computation of C 1 and C 2 given by Eq. (20), the bandlimited property of 
W (m) results in the fact that only m- values within the main lobe of the response must be 
considered reducing computational complexity for the gradient from 0(KN) to O(K). 
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4.2 Gauss-Newton Optimization 

A second well-known method is called Gauss-Newton optimization and consists of making 
a first order Taylor approximation of the signal model around an initial estimate of the 
frequencies denoted as w. When making a first order approximation of the signal model 
s given by 

w„exp(-27rzwjfc^) « w n exp(-27r^ fc ^r a ) + 
the error function yields 

AT-1 j iC-l 

X(u>;A) = ^(s n - ^^(^exp^^ 

n=0 fc=0 

xo +(-27ri^) exp(-27riw fc ^) + A£ exp(27ri£*^)] (w* - w fc ))) 2 

The least square error for this function is derived by equating all partial derivatives to zero 

1^4 = 0 (33) 

This results in 

HAw = h (34) 



with 



Awj = Cji — wi 



N 



771=0 / 

H tk = ^Ay^ + ^O)-^^^^^^^)) (35) 

One can observe that the right hand side of the equation is the gradient. For the system 
20 matrix H a similar structure is observed as for the matrix B which was used for the amplitude 
computation. Again, the bandlimited property of Y"(m) implies a band diagonal structure 
for H. This implies that also in this case the time complexity can be reduced by storing H 

85 H 

Hik = Hm. k - D (36) 

25 and by computing Aw using 

Ad> = SOLVE{H } h) (37) 
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Conclusion 

Analogue to the system matrix B for the amplitude computation, the system matrix H for 
the computation of the optimization is also band diagonal. Again the set of equations can 
be solved in O(K) time. 

4.3 Levenberg-Marquardt Optimization 

When considering the system matrix H, used for Gauss-Newton optimization it is possible 
that it is poorly conditioned when the amplitudes are very small. This can be solved by adding 
the unit matrix multiplied with a factor A which is called the regularization factor. Note that 
the regularized system matrix is still bandlimited and can still be computed in O(K) time. 
Using Eq. (35), the optimization can be written as 



Since the optimization step Aa; depends on A we write it in function of it. 

The error function after iteration W is denoted by x( w ^; A) and the optimization step of 
the frequenties that was achieved with regularization factor A (r) as Aw(A^). The influence 
on the cost function for the next iteration is expressed by 



The value of A< r+1 > is adapted each iteration using A< r+1 > = A« and A< r+1 > = X^/tj. The 
choice between these updates is made by following rules; 

1. If X (^ (r) 4- Aa)(A^/? ? ); A) < xfa (r) ;^), then A< r+1 ) = X^/r} and £< r+1 > = o)< r > + 
Aw(A^A?)- 

2. If x(" (r) + AcD(A( r >/77); A) > x(^ (r) ;^), and x(w (r) + Ad)(A^);A) < x(u {r) )A), then 

A(r+D = \(r) aa d Q(r+l) = C (r) + ^(A"). 

3. Finally, when both X (^ (r) + Au(X^/tj)]A) > x(^ (r) ;^)> as x(^ (r) + AtD(A^); A) > 
x(&W]A), then X^ is multiplied by 77 until for a given q> x(&^ + &w(XWr) q )] A) < 
x(^ (r) ; A). Subsequently, A< r+1 > = A< r V and u)( r+1 > = u> (r) + Ao;(A< r V)- 



Conclusion 

Since adding a regularization term to the diagonal elements does not affect the band diagonal 
structure of H, the O(K) complexity is maintained. 



Aw(A) = SOLVE(H + A/,h) 



(38) 



x(o; (r) +Ao;(AW);A) 



(39) 
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4.4 Newton optimization 

Another commonly known method is Newton optimization which makes a second order 
Taylor approximation of the error function around 6). The minimum of this approximation 
yields the optimized values and results for the model given in Eq. (29) in 



Note that the only difference between the system matrix H for Newton and Gauss-Newton 
optimization is the additional last term. This term can be computed in constant time by 
taking in account the bandlimited property of W"(m). Again, since this term only yields non 
zero values on the diagonal, the O(K) complexity is maintained. Also, this method can be 
combined with the regularization term that is used for Levenberg-Marquardt optimization. 
Conclusion 

The system matrix for Newton optimization is band diagonal and can be regularized when 
this is desired. The O(K) complexity is maintained. 

4.5 Unifying the Optimization Methods 

Gauss-Newton, Levenberg-Marquardt and Newton optimization can be written as a uni- 
fied optimization procedure with two parameters Ai and A 2 yielding 



HAo) = h 



(40) 



with 




u)[ — u)i 




to(A k A l Y»{Cj k + £j l )) - ^{A k A\Y\Cj k - d>,)) 




(42) 
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Conclusion 

Depending on the values Ai and A2 one can switch between different methods 

1. If Ai = 0 and A 2 = 0, Eq. (42) becomes Gauss-Newton optimization. 

2. If Xi = 1 and A 2 = 0, Eq. (42) becomes Newton optimization. 

3. If Ai = 0 and A 2 > 0, Eq. (42) becomes Levenberg-Marquardt optimization. 

For each of these algorithms the band diagonal structure of the system matrix can be ex- 
ploited. The algorithm for the frequency optimization step is illustrated by Figure 7. 

A preferred embodiment of the method according to the invention, comprises the step 
of optimizing the frequencies for the stationary nonharmonic model by solving the equation 

given in Eq. (34), using Eq. (42) such that only elements around the diagonal of H are 

« — 

taken into account, whereby a shifted form H is computed containing only the D diagonal 
bands according to Eq. (36) and Eq. (42), whereby the the gradient h is computed from 
the residual spectrum i?™, amplitude Ai and frequency u>k and requires the computation 
of the derivative of the frequency response of the window W'(m), whereby the first term 
of H requires the computation of the second derivative of the frequency response of the 
square window denoted Y"(m), whereby the second term of H is computed from the residual 
spectrum Rm, amplitude A\ and frequencies Q and requires the computation of the second 
derivative of the frequency response W n {m), whereby the parameter Ai allows to switch 
between different optimization methods and the parameter X 2 regularizes the system matrix, 
and computing the optimization step by solving the the system of equations directly on H 
and h according to Eq. (37) by an adapted gaussian elimination procedure. This method 
reduces the time complexity from 0(K 2 N) to 0(N log N). 

5. Frequency Optimization for the Stationary Harmonic Model 

In the case that all sound sources produce quasi-periodic signals, a model can be used 
that takes into account this relationship between te partials, yielding 



The model consists of 5 sources each modelled by harmonic components. For this model, 
only the fundamental frequencies are optimized. The amplitude estimation is computed by 
the method disclosed in section 2, however care must be taken that different components with 




5-1 s fc -i 



(43) 



k=0 (7=0 
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very close frequencies are eliminated. The computation of the optimization of the frequencies 
takes place in an analogue manner as for the independent sinusoids. 

5.1 Gradient Based Methods 

The gradient for the harmonic model yields 

h( = axgii) ^--igsR RrtA^Wto* - m)) (44) 

1 q=l \m=0 / 

5.2 Gauss-Newton Optimization 

The system matrix for Gauss-Newton optimization results in 
s k -i $-1 

H * = EE * r [X(Ak«Ai t rY"(qu> k + rut)) - R^^y*^ - run))] (45) 
In this case, the matrix is not band diagonal and the optimization step is computed by solving 

HAu; = h (46) 

For a given value q } and a given frequency response bandwidth /?, only the r values must be 
considered for which ru>i falls in the main lobe. Since 

TV 

0<qw p <j 
N 

0<rw L < — 

the input values of Y" are bounded by 

TV N 

0 < qu) p + rui < TV 

This implies that the main lobe of Y (qu p - ru){) ranges from -/? to /?. For Y (qu) p -\- ruji) the 
main lobe is divided over the left and right side of the spectrum due to spectral replication 
yielding the intervals [0,/?] and [TV - 0,N]. This implies that for Y(qu p - ru) t ) only the r 
values must be considered for which 



-0 < qujp - rui <P 
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The two intervals for Y(quj p + rui) yield 



0 < qu p + ru)i < P 



U)l 



Ui 



and 



N -P <qu) p + ru) t <N 
N-p-quj p < ^ N-qu p 



This results finally in 



5-1 



Hi 



7*maa: ( l 



= E E Qr^{A v , q A i%T Y"(qu }p + TUH)) 

q=\ L r=l 

+ Yl qrSt(A P , q A l , T Y"(qcj p + roj l )) 

r — r mtn,2 



with 



r ~ r miTi,3 



J 



N-P-qu p 



OJi 

N -qu p 

qup-P 

uji 

qvp + P 



1 



J 



1 



J 



5.3 Levenberg-Marquardt Optimization 



Analogue as for the non harmonic model, the system matrix can be ill-conditioned in the 
case of very weak components. When this occurs, one can add the unity matrix I multiplied 
with a regularization factor A. This value can be updated as described in section 3.3. 
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5.4 Newton Optimization 

Also for the harmonic model, the system matrix for Gauss-Newton and Newton opti- 
mization are very similar. Only to the diagonal band, an additional term must be added 
yielding 

- f E E ^ 2 A P , q W"(quJ p - m) J (47) 



\ q—l m—Q 

5.5 Unifying the Frequency Optimization Methods for the Harmonic Model 

The proposed optimization methods can be unified in one set of equations using two 
parameters Xi and A 2 yielding 

HAoj = h (48) 



with 



5,-1 /N-l \ 
o=l \m=0 / 



9= 

5-1 



= E 



grSR^^y'^ + r^) 

r=l 

r tnoi,2 

r =?"mm,2 
f*max,3 

-Ai^R (E E ^9 2 ^.^"(^ -"»)]+ M» (49) 



^=1 m=0 

Conclusion 

Depending on the values Ai and A2 one obtains 

1. If A x = 0 and A 2 = 0, Eq. (49) becomes Gauss-Newton optimization. 
20 2. If Ai = 1 and A 2 = 0, Eq. (49) becomes Newton optimization. 

3. If Ai = 0 and A 2 > 0, Eq. (49) becomes Levenberg-Marquardt optimization. 
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The algorithm for the frequency optimization step is illustrated by Figures 8 and 9. 

A preferred embodiment of the method according to the invention, comprises the opti- 
mization the frequencies for the harmonic signal model, by computing the optimization step 
solving Eq. (48) using Eq. (49), whereby the gradient h is computed from the residual spec- 
trum Rm, amplitude A\ and frequencies a), and requires the computation of derivative of the 
frequency response of the window W(m), whereby the first term of H requires the computa- 
tion of the second derivative of the frequency response of the square window denoted Y"(m), 
whereby the second term of H is computed from the residual spectrum i^, amplitude Ai 
and frequencies a;*, and requires the computation of the second derivative of the frequency 
response W n {m), whereby the parameter Ai allows to switch, between different optimization . 
methods and the parameter A 2 regularizes the system matrix. 

6. Sinusoidal Modeling with Nonstationary Components 

6.1 The Model 

In many applications it is interesting to study the nonstationary behavior of the ampli- 
tudes and phases. Therefore, complex polynomial amplitudes of order P are proposed. For 
a model with K sinusoidal components this results in 
K-l p-i 

*=0 p=0 



(50) 



This can be reformulated as 



^ -TV —J. X i 

s» = w n - £ Yj A Ip [(- 27rin ^) P exp(-2™ fc ^) + (27ri^)P exp(27rw fc 3^)] + 



K-lP-l 



fc=0 p=0 



(51) 



6.2 Complex Polynomial Amplitude Computation 



The square difference between the signal and the model is written as 



N-i / - K-l P-l 



EU-^EE AT k.v [(-27rt^) p ex P (-27ri Wjt ^) + (27n^)*exp(27rz Wjt ^)] + 



n=0 \ " k~0 p~0 



»4, P [(-27ri^p) p exp(-27riw t ^) - (27r^) p exp(27riw^))) 2 (52) 
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The amplitudes are computed by taking all partial derivatives with respect to A T l q and A} g 
and equate this expressions to zero yielding 
yv-i / K-i p-\ 

£(*«- 53 I] ^fc, [(-27r^) p exp(-27r^^) + {2iri^Y eM^^k^)] + 

n=0 \ fc=0 p=0 

i^ iP [(-27ri^) p exp(-27rMJ fc ^) - (27n^) p exp(27rw t ^)]) 

[(-ZTri^p) 9 exp(-27ria;,^) + (2tu^)« exp(27rio,,^)] J = 0 (53) 

and 

n-\ / . K-l P-l 

^2 ( *n - w n - J2 J2 A *,p [(-2ni^) p eKp(-2imj k *=p) + (2m^y exp(27rw fc ^)] + 

n=0 \ fc=0 p^O 

iAt jP [(-27rz^) p exp(-27r2^^) - (2m^) v exp(27riio k ^)}) 

(j- iv) n^ K-Zni^y exp(-27r^^) - (2m^) q exp(27r^^)]^ = 0 (54) 

10 This results in 2KP equations which allow to determine the 2KP unknowns. 

As a result, the system matrix has a size 2KP x 2KP. Analogue to the system matrix for 
the amplitude computation B, the system matrix can be divided in four quadrants denoted 
B 11 , B 1 ' 2 , B 2 ' 1 and B 2 ' 2 yielding 



B 11 


B 1,2 




" A 1 " 




' C 1 ' 


B 2,1 


B 2 ' 2 




A 2 




c 2 



WO 2005/055202 



27 



PCT/EP2004/013630 



with 



n-i 



^qX+l^pK+k 



pl,2 



d2,1 



.2,2 

lqK+l }P K+k 



C qK+l 



= 7E W « l(- 27ri ^) P exp(-27riw fc n? ft ) + (27ri^) p exp^Tri^^)] 



n=0 



[(_27ri^) 9 exp(-2jricj|^) + {2m^) q exp(27riw t ^)] 



= i >n [(-27ri^) p exp(-27rio; fc ^) - (27ri^) p exp(27r«j fc ^p)] 



n=0 



exp(-27rw i ; l ^) + (27ri2^*)« exp(27rtu>, ^)] 



. N-l 



lJ2 w » K~ 2lTi ^y ^exp(-27Tiw fe ^) + (27ri^) p exp(27riu; fc ==^)] 



n=0 



[(-27ri==p) , exp(-27riw,^p) - (2m^) q exp(27ria; / ^)] 



= - \ wl [(-27rz^) p exp(-27rw fc ^) - (27u^) p exp(27na; fc =9*)] 



Tl=0 



[(_27ri^)«exp(-27ria;i^i) - (27ri^) ? exp(27rtw,^)j 

N-l 



]T x n w n ± ((-27ri^)«exp(-2 7 ra; i ^p) + pm^)* exppnu,^)] 
i J2 ^w n - [(-27ri^)*exp(-27ru^) - (27ri^) 9 exp(27ra; i ^)] 



pK+k 
^>K+k 



= Ai 



k,p 



(56) 



The real and imaginary part of the frequency response and its derivatives can be expressed 
using 

- N-l l N-l 

»pr(m)] = -^^nexp(27ri^) + -^^exp(-27rzm^) 



amp 



n=0 

n-i 



n=0 



»[y(m)] = |^u£(27ri^) p exp(27nm^) + 



n=0 
N-l 



\ YL ^n(-27rz^) p exp(-27rim^) 



(57) 



n=0 



am? 



n-i 



71=0 

N-l 



n=0 



n=0 
N-l 



^ X^^(-27ri^) p exp(-27rzm^p) 

n=0 



(58) 
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from which follows that the expressions of Eq. (56) can be transformed to 



pl>2 

n qK+l tP K+p 



n qK+l,pK+p 



VqK+lvK+p 



r 2 

^qK-H 



1 T SP** 

_i r 



+ (-i) 



, „ , &[F(m)]l 

QP+<1 "1 



- (-!)<■ 



77V=a;fc4-u>A: 



Qp+q 



»[r(m)]l 

■apr(m)j| 

(y(m)]] 



SR[y(m)] 



= » 



\ 771=0 



d q 



dm?** 
1 T S^ 9 

i r a p+g i 



<9< 



771=0 



4 r 



(59) 



The vectors C and matrices B are now expressed in terms of the frequency response of the 
windows and the square window respectively. Each (p, g)-couple denotes a submatrix of the 
matrices of size KxK. Prom the bandlimited property of $l[Y (m)] and its derivatives follows 
that these submatrices of B 1,1 and B 2,2 are band diagonal. In an analogue manner, since 
SjY(m)] and its derivatives always yield zero, the submatrices B 1,2 and B 2,1 contain only 
zeros. This structure is depicted at the top of Figure 10. 

The upper left and lower right kwadrants contain band diagonal submatrices for each 
(p, <7)-couple. This implies that all relevant values are stored at positions defined by a quadru- 
ple (Z, <7, k } p) for which the following conditions hold: 



-D < k - I < D 
0<p<P-l 
0<q<P-l 

The inequalities given in Eq. (60) can be transformed to 

—DP < (k - l)P < DP 
0 <p < P- 1 
~(P - 1) < -9 < 0 



(60) 



(61) 
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from which follows that 

—{D + 1)P + 1 < (kP + p) - {IP + q) < (D + 1)P - 1 (62) 

By inverting the indexation order, i.e. using (kP + p,lP + q) instead of (pK + qK + /), 
one obtains for the row index kP 4- p and for the column index IP -f q. Since their difference 
denotes the index of the diagonal, it follows from Eq. (62) that all relevant values lie around 
the main diagonal. This is illustrated by the lower part of figure 10, A a result, the definition 
of the system of equations after inversion of the indexation becomes 




d q \ 
W(m 4- £J/) J 



Aip+p = (63) 

By using a look-up table for each derivative of the frequency response each element can be 
computed in constant time. Since B 1,1 and B 2,2 are band diagonal they can be stored in a 
more compact form containing only the relevant diagonal bands, yielding 

d1,i _ pi,i _ pi,i 

° — n lP+q,lP+q+kP+p-{D+l)P±l - n lP+qXk+l-D)P+{p+q-P+l) 

R2,2 _ p2,2 _ p2,2 /a , 

with p and q ranging from 0 to P - 1, I ranging from 0 to K - 1, and k from 0 to 2D. 
Conclusion 

A least squares method is derived which allows to analyse non stationary sinusoidal com- 
ponents defined by Eq.(50). This model for a windowed signal of length N, consists of K 
sinusoidal components with complex polynomial component of order P. When the equa- 
tions are solved in the time domain the computation of the system matrix has a complexity 
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0{{KP) 2 N) and solving the equations a complexity 0((KP) 3 ). By using the band diagonal 
property of the submatrices and rearranging the index so that all relevant values lie close to 
the main diagonal the complexity can be reduced to 0(KP(DP) 2 ). Generally, the order of 
the polynomial and the number of diagonal bands is quite small relative to the number of 
components K and number of samples N, 

A preferred embodiment of the method according to the invention comprises the step of 
computing the polynomial complex amplitudes by solving the equation given in Eq. (55), 
using Eq. (56) such that only the elements around the diagonal of B are taken into account, 
whereby a shifted form B is computed containing only PD diagonal bands of B according 
to Eq. (64) and Eq. (56), whereby the computation is required of the frequency response of 
the square window and its derivatives ^Y(m) } whereby the computation is required of the 
frequency response of the window and its derivatives ^W(m) y and solving the equation 
given by Eq. (55) directly from B and C by an adapted gaussian elimination procedure. . 
This method reduced the complexity from 0{{KPf) to (D(KP(DP) 2 ). 

6.3 Model Interpretation 

The fact that amplitudes are complex polynomials makes them awkward to interpret. It 
is more convenient to interpret the sinusoidal model in terms of instantaneous amplitudes, 
phases and frequencies. Therefore, the model given by Eq. (50), is written as 



(65) 



l_fc=o 



and reformulated using 




(66) 



resulting in 



r/f-i 



(67) 



This equation can now be written as 




(68) 



with 




2 



(69) 
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where ^k( n ) au d are called respectively the instantaneous amplitude and frequency of 

each partial k. To simplify the notation, a r (n) and a { (n) are defined as 

al.(n) = £4,(-27r^)> 
p-1 

4(n) = ^ i U- 2 ^) P (70) 
The instantaneous amplitudes, phases and their derivatives can now be written as 



d* k {n) ^ al{n)o%{n) + a\{n)a$(n) 



dn 



^(n) 2 + 4(n) 2 



= KfrP + 4(n) 2 ) 3 / 2 (Kr(n)2 + a * (n) ^ (n) + ^ + 
K(n) 2 + " K*(nK(n) + 4(n)a£] 2 ) 

$k(n) = 2mu k ^i 4- * arctan -ff-£ 

gjW = | ^(n)ag(n)-at(n)g{r(n) 

9n ^ Oi r k (n) 2 + <4(rc) 2 

- *7 r i X2([«J(nK' < (n)-4(n)ar(n)]K(n) 2 + ai(n) 2 ] 

dU («S(n) 2 + 4(n) 2 ) 

+2 [aUn)ajT(n) + <4(«K(»)] [«i(»K(n) - c£(nK(n)]) 
At n 0 > the derivatives of a r (n) and or*(n) yield 



a£(n 0 ) - A r k 



*,0 

_2rr /jr 



L J n=no 



=no 

4(n 0 ) = 4 



Ar.O 

1 Jb,l 



^ (n0)= [|^^ (n) ] n -no = 2 ^ )2ilfc ' 2 
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resulting for the instantaneous amplitudes and frequencies and their derivatives at n 0 



d* k (ny 



dn 



dn 2 



n=no 



n=no 



- -(*) 



2(*) 



*,0 ^ ^fc.O 



A l,l + ^fc.l + 2^,0^1,2 + 2^,0^1,2] 



(71) 



$k( n o) ~ zarctan 

r a$ fc (n) i 

L J n=no 
L ^ J n= 



/Jr >jt _ Ai AT 
- 2 ; — 



(72) 



Note that the first derivative of the phase is the instantaneous frequency at no- This can be 
used for an iterative optimization of the frequency u) k yielding 

,.H-0 _ (r) , a x Ar k } o A k t i ~ ^fc.o^M 

In addition, the amplitude derivatives evaluated at n 0 define a second order approximation 
of the instantaneous amplitude around n 0 . 



3*fc(n) 



(n - n 0 ) + 



1 \d 2 * k (n) 



j n=no 



It 



dn 2 



(n - n 0 ) 2 



(74) 



n=no 



In the case that the amplitudes are exponentially damped, as frequently occurs for percussive 
sound, one can equate 



dn 



n=no 



Aex.p(p k (n - n 0 )) « #fc(n 0 ) + 
By evaluating both members for no one obtains 

A k « *t(no) 



(n - no) + ^ 



d 2 #A:(n) 



9n 2 



(n-no) 2 (75) 



(76) 



By taking the derivatives of both members and evaluating the expressions for n 0 one obtains 

"S*fc(n)" 



9n 



(77) 



J n=no 



WO 2005/055202 



33 



PCT/EP2004/013630 



The damping factor p can be determined from the two previous equations and Eq. (71), 
resulting in 

Pk « ^») ^H±iffi (78) 

Conclusion 

A preferred embodiment of the method according to invention, comprises the step of comput- 
ing the instantaneous frequencies and the instantaneous amplitudes according to Eq. (69), 
whereby the instantaneous frequency can be used as a frequency estimate for the next iter- 
ation as expressed in Eq. (73). In addition, the method comprises the step of computing 
damping factor according to Eq. (78), in case that the amplitudes are exponentially damped. 

7. Adaptation to Variable Window Lengths 

The FFT requires that the window size is a power of two. However one can desire to use 
a window length which is not a power of two. For that case, a scaled table lookup method is 
disclosed which allows to use arbitrary window lengths which are zero padded up to a power 
of two. First, a theoretical motivation is given which is represented in Fig. 12. The fourier 
transform of a window with length M is denoted as yielding 

W M {m - mo) (79) 

When the window is zero padded up to a length N we obtain a new frequency response 
denoted as W$(m) which can be expressed as a scaled version of W M (m) yielding 

W$(m - n 0 ) = W M (^m - mo) (80) 

where m now ranges from 1 to TV - 1. As a result, the spectral bandwidth of the frequency 
response is enlarged to — j^P < m < jjjP. 

In the next step, the spectrum is truncated to a length N' and the inverse fourier transform 
is taken resulting in 

<(«-no) = f™ M (7^-m 0 ) (81) 
where the rescaled window size is given by M' = M^. The combination of time domain zero 
padding and frequency domain truncation allows to express a normalized window j^w^ t (n- 
n f 0 ) with length M' zero padded up to a length N* in function of W M (m) using 

—w»< (n - ri Q ) = - 2^ W M (—m - m 0 ) exp(27rz °— ) (82) 

m=0 
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For the practical implementation, the oversampled main lobe of W(m) is stored in a table 7}. 
The parameters that are required to. compute the variable length frequency response given 
in Eq. (82) are 

• M: window length used to compute the look-up table 

• N f : desired FFT size 

• M': desired window size 

The table has a length i L and the first index i of the table is denoted i 0 . These index values 
correspond with the m-values over a range [m a ,m b ]. This leads to the following relation 
between the input value m and index i 

rn = m a + (m b - m a ) - — ~- (83) 

2=l0 + ( lL _1) -JL (84 

The values of W(m) are obtained by a simple linear interpolation between the closest i-values 
yielding 

W(m) = (i- LiJ)2l<j + (l-t+ L<J)Tl<j+i (85) 

where i is computed from m using the previous formula. 

When a window with length M 1 is taken which is zero padded up to a length N\ the 
main lobe is enlarged up to a size 2^/?. Therefore, the synthesis of a frequency u>k (see Eq. 
??) requires the computation for all frequency domain samples m for which' 

™>min < ™> < mmax 

with 

N' 

N' 

m max = [u k -f —0\ (86) 

Conclusion 

All previously described algorithms can be adapted to allow arbitrary window lengths zero- 
padded up to a power of two. Eq. (82) shows that a zeros padded window can be computed 
by scaling its frequency response. Note that for the derivatives of the frequency responses 
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this scaling must be taking into account. Another result is that the width of the frequency 
response is enlarged as expressed by Eq. (86). 

A preferred embodiment of the method according to the invention, comprises a method 
to compute the frequency response of a window with length M zero padded up to a length 
N by using a scaled table look-up according to Eq. (82). 

8 Amplitude Computation Pre-processing 

The goal of the pre-processing before the amplitude computation is twofold. On one 
hand the frequencies are sorted in order to obtain a band diagonal matrix for B. In addition, 
frequencies that occur twice result in two exact rows in B making it a singular matrix. 
Therefore, no double frequencies are allowed for the frequency computation. 

On the other hand, the preprocessing determines how many diagonals of the matrix B 
must be taken into account. This is done by counting the number of sinusoidal components 
that fall in the main lobe of each frequency response. The maximum number of components 
over all frequency responses yields the value for D. 

9 Applications 

The computational improvement of the method according to the invention facilitates a 
large number of applications such as; arbitrary sample rate conversion, multi-pitch extraction, 
parametric audio coding, source separation, audio classification, audio effects, automated 
transcription and annotation. 

Several applications are depicted in Figure 13. 

9.1 Arbitrary Sample Rate Conversion 

In section 7 it was shown that the window length can be altered by scaling the frequency 
response of the sinusoidal components. The fourier transform itself is sinusoidal representa- 
tion of a sound signal where the frequencies are given by 

^ = ^ (87) 

with k = 0, . . . , N - 1. When the Blackmann-Harris is applied, the amplitudes for all these 
frequencies can be determined by the optimized amplitude estimation method presented in 
section 3. 
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When the window size is enlarged by a factor a and the frequencies are divided by the 
same factor, a resampling of the signal is obtained. The resampling factor a can be any real 
number and results therefore in an arbitrary sample rate conversion. 

9.2 High Resolution (Multi)Pitch Estimation 

The efficient analysis method will. improve pitch estimation techniques. Current (multi)- 
pitch estimators based on autocorrelation such as the summary autocorrelation function 
(SACF) and the enhanced summary autocorrelation function (ESACF), allow to estimate 
multiple pitches. However, none of these methods takes into account the overlapping peaks 
that might occur. The frequency optimization for harmonic sources which/is presented in this 
invention allows to improve the fundamental frequencies iteratively leading to very accurate 
pitch estimations. In addition, very small analysis windows can be used which enable to 
track fast variations in the pitch in an accurate manner. 

9.3 Parametric Audio Coding 

The resynthesis of the sound is of a very high quality which is indistinguishable from 
the original sound. In addition, the amplitudes and frequency parameters vary slowly over 
time. Therefore, it is interesting to apply our method in the context of parametric coders 
where these parameters are stored in a differential manner what results in a considerable 
compression. Evidently, this is interesting for the storage, transmission and broadcasting of 
digital audio. 

9.4 Source Separation 

When a multipitch estimator provides good initial values of the pitches the method op- 
timizes all parameters so that an accurate match is obtained. By synthesizing each pitch 
component to a different signal, the sound sources in the polyphonic recording can be be 
separated. 

9.5 Automated Annotation and Transcription 

Fast variations in the amplitudes A and frequencies u> indicate the beginning and end of a 
note. Therefore the method will contribute to the automatic annotation and/or transcription 
of the audio signal. 
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9.6 Audio Effects 

By modifying the frequencies and amplitudes of the different sinusoidal components high 
quality audio effects can be achieved. The power of this method lies in the fact that fre- 
quencies and amplitudes can be manipulated independently. This allows for instance time- 
stretching, sound morphing, pitch changes, timbre manipulation etc. all with a very high 
quality. 
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DETAILED DESCRIPTION OF THE FIGURES 

Figure 1 depicts the complete Analysis/Synthesis method according to the embodiment 
of the invention. Starting from a windowed short time signal x n (1) and its fourier transform 
(2) X m (3) the initial values of the frequencies (5) are computed (4). These frequencies 
(5) are then pre-processed (6) and the number of diagonal bands D (7) is determined. The 
amplitudes (11) are computed from X m , the number of diagonal bands (7) and the pre- 
processed frequencies (8). The amplitudes (11) and frequencies (8) are used to calculate 
the spectrum X m (13). The difference (14) between the synthesized spectrum Xm, (13) and 
the original spectrum X m (3) yields the residual spectrum (16). This residual spectrum 
(16), the frequencies (8) and amplitudes (11) are used to optimize (9) the frequency values 
(5) for the next iteration. A stopping criterium evaluator (17) determines whether the loop 
is continued. Several criteria were described in section 1.2. When the criterium is met, the 
iteration is terminated (18). The time-domain model x n is obtained by taking an inverse 
fourier transform (19) of the spectrum X m (13). A short notation is depicted (20) which 
takes as input the signal x n and produces a synthesized signal x ni the amplitudes A and 
frequencies u. 

Figure 2 illustrates the band limited property of respectively W(m) (top), W f (m) (middle) 
and W"(m) (bottom). On the left they axe represented on the linear scale. On the right they 
represented on the dB scale. 

Figure 3 illustrates frequency response of the zero padded Blackmann-Harris window 
W^{m) (top), the squared Blackmann-Harris window Y(m) (middle) and its second deriva- 
tive Y"(m) (bottom). Also these frequency responses are band limited and are shown on the 
linear scale on the left, and on the dB scale on the right. 

Figure 4 depicts the detail of the spectrum computation. On the left hand side the 
computation is given for the harmonic model. For each sound source k ranging from 0 to 
S - 1 (21) , and each component p ranging from 0 to S k - 1 belonging to this source (22) , the 
range of m- values is determined (23). Then, for each m- value (24) the frequency response 
W(m) is computed and multiplied with the amplitude (25). On the right hand side the 
spectrum computation is shown for the nonstationary model is shown. For each component 
indexed by k and ranging from 0 to if - 1 (26) the range of spectrum samples m is computed 
(27). Then, for each order p ranging from 0 to P- 1 (28) and each spectrum sample m (29) 
the frequency of the pth derivative of the frequency response W(m) is computed, multiplied 
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with the amplitude A ktP and added to the spectrum X m (29). (30) shows a short notation 
for the spectrum calculator. 

Figure 5 illustrates the band diagonal property of the system matrix B that is used for 
the amplitude computation. As described previously, the matrices B 1 ' 1 and B 1,1 can be 
written in terms of two matrices Y + (33) and Y" (32) as indicated by (34). The index k 
denotes the column of the matrix and / the row. This implies that k — I and k + I indicate 
respectively the diagonal and antidiagonal of the matrix. By multiplying the diagonal index 
with the fundamental frequency, the input value for the function Y(m) is obtained which 
denotes the frequency response of the square window (31). The space complexity is reduced 
by* storing only the relevant diagonals in a 'shifted matrix' B u (35). 

Figure 6 depicts the detail of a method of computing the amplitudes of the sinusdoidal 
components in a sound signal in 0(N log N) time, according to the invention. The amplitudes 
A (44) are computed from a spectrum X m for a given set of frequencies Q. This is realized 
by constructing the matrices C 1 , C 2 (40) and the matrices B 11 , B 2,2 (42) according to 
Eq. (20). By solving the set of equations represented by these matrices the amplitudes are 
computed (44). The vectors C 1 and C 2 are computed by determining for all partials I (36) 
the range of m values (37), (38) of the main lobe and computing the value for each m-value 
(40) according to Eq. (20). For the matrices B 1 ' 1 and B 2 > 2 , the shifted matrices B 1 ' 1 and 
B 2,2 are computed containing only the band diagonal elements. The width of the band is 
denoted D, For all k values from 0 to 2D (41) each row of the matrices B 1 ' 1 and B 2,2 is 
computed (42) according to Eq. (20). The equations denoted in Eq. (19) can now be solved 
directly on the shifted versions of B M , B 2 ' 2 , (43) yielding the amplitude values (44). A 
short notation for the computation is denoted by (45). 

Figure 7, depicts the frequency optimization for the non harmonic model according to the 
embodiment of the invention. It shows how the gradient and system matrix are computed 
for different optimization methods as described in section 4. For each sinusoidal component 
(46), the relevant range of spectrum samples m is determined (47). Over this range (48), 
the gradient elements and the diagonal elements of the system matrix are computed (49) 
according to Eq. (41). Then, all diagonals k (50) of the system matrix are computed (51) 
according to Eq. (41). In addition, a regularization term is added to the diagonal elements 
(51) according to Eq. (38). The optimization step (54) is computed by solving the set of 
equations (53). A short notation is denoted by (55). As follows from Eq. 42, the parameters 
X { and A 2 allow to switch between different optimization methods and allow to regularize the 
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system matrix. 

Figures 8 and 9 depict the frequency optimization for the harmonic model according to the 
embodiment of the invention. For each sinusoid q (57) of a source I (57), the relevant range 
of spectrum samples m is determined (58). This range is used (59) for the computation of 
gradient h and diagonal elements of the system matrix H (60) according to Eq. 49. In a 
subroutine (61), (66) the other elements of H are computed. For each matrix column k (67), 
the ranges of r-values are determined (68, 71, 74) and matrix elements are computed (70, 
73, 76) over these values (69, 72, 75), according to Eq. (49). After the subroutine (77, 
62), the regularization term A2 (63) is added to the diagonal values. Finally the optimization 
step A(w) (65) is computed by solving the equations (64). 

Figure 10 shows the band diagonal submatrices for each (p.g)-couple. All relevant values 
are positioned around the main diagonal by inverting the indexation order. 

Figure 11 depicts the embodiment of the the polynomial amplitude computation as defined 
in Eq. (56). For each component I (78) the range of m-values is determined (79). The values 
C 1 and C 2 are computed (82) by iterating over q (80) and m (81). The diagonal bands 
of B 11 and B 2 * 2 are computed (85) and stored in B 1,1 and B 2>2 by iterating over I (78), 
V (83), q (80) and k (84). Finally, the complex polynomial amplitudes are computed by 
solving the equations (86). 

Figure 12 illustrates the theoretic motivation for a scaled table look-up. A time domain 
window of length M, denoted by w M (n) (87) is considered for which the frequency response 
(90) is bandiimited within a range [-/?,/?]. When this window is zero padded up to a 
length N (88) this results in a scaling in the frequency domain (91). Then, the spectrum is 
truncated (92) resulting in a length N'. When taking the inverse fourier transform of this 
truncated spectrum, a window with length M l zero padded up to a length N* is obtained 
(89). 

Figure 13 shows several applications of the analysis method according to the embodiment 
of the invention. The top of the figure illustrates the application of the invention (93) in 
the context of parametric/sinusoidal audio coding. At the sender side, the amplitudes A, 
frequencies G) and noise residual r n are encoded (94) in a bitstream (95) which can be 
stored, broadcasted or transmitted (96). At the receiver side, the decoder (97) computes 
the amplitudes A, frequencies Q and noise residual r n back from the bitstream. Subsequently, 
the spectrum is computed (98) and by taking the IFFT (99) and adding the noise residual 
(100), the signal model is computed (101). 
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In the middel of the figure, it is shown how the invention C102) facilitates advanced audio 
effects. The parameters A, <D and the noise residual r n are processed by an effects processor 
(103) yielding the processed values A*, a)* and r* (104). With these values, the spectrum 
is computed (105), an IFFT is taken (106) and the modified residual r* is added (107), 
s resulting in the modified signal (108). 

At the bottom of the figure, the application of the invention (109) is depicted in the 
context of source separation. A source demultiplexer (110) classifies all component by their 
sound source (111). By computing the spectrum (112) and taking the inverse transform 
(113), the different sources are synthesized separately (114) . 



